Unexpected features of the dark proteome.

Perdigão, Nelson; Heinrich, Julian; Stolte, Christian; Sabir, Kenneth S; Buckley, Michael J; Tabor, Bruce; Signal, Beth; Gloss, Brian S; Hammang, Christopher J; Rost, Burkhard; Schafferhans, Andrea; O'Donoghue, Seán I.

Proc Natl Acad Sci U S A ; 112(52): 15898-903, 2015 Dec 29.

Article in English | MEDLINE | ID: mdl-26578815

ABSTRACT

We surveyed the "dark" proteome, that is, regions of proteins never observed by experimental structure determination and inaccessible to homology modeling. For 546,000 Swiss-Prot proteins, we found that 44-54% of the proteome in eukaryotes and viruses was dark, compared with only ~14% in archaea and bacteria. Surprisingly, most of the dark proteome could not be accounted for by conventional explanations, such as intrinsic disorder or transmembrane regions. Nearly half of the dark proteome comprised dark proteins, in which the entire sequence lacked similarity to any known structure. Dark proteins fulfill a wide variety of functions, but a subset showed distinct and largely unexpected features, such as association with secretion, specific tissues, the endoplasmic reticulum, disulfide bonding, and proteolytic cleavage. Dark proteins also had short sequence length, low evolutionary reuse, and few known interactions with other proteins. These results suggest new research directions in structural and computational biology.

Subject(s)

Computational Biology/methods , Databases, Protein , Proteins/metabolism , Proteome/metabolism , Algorithms , Animals , Archaea/genetics , Archaea/metabolism , Bacteria/genetics , Bacteria/metabolism , Eukaryota/metabolism , Humans , Models, Molecular , Protein Conformation , Proteins/chemistry , Proteins/genetics , Proteome/chemistry , Proteome/genetics , Viruses/genetics , Viruses/metabolism

Integrated visual analysis of protein structures, sequences, and feature data.

Stolte, Christian; Sabir, Kenneth S; Heinrich, Julian; Hammang, Christopher J; Schafferhans, Andrea; O'Donoghue, Seán I.

BMC Bioinformatics ; 16 Suppl 11: S7, 2015.

Article in English | MEDLINE | ID: mdl-26329268

ABSTRACT

BACKGROUND: To understand the molecular mechanisms that give rise to a protein's function, biologists often need to (i) find and access all related atomic-resolution 3D structures, and (ii) map sequence-based features (e.g., domains, single-nucleotide polymorphisms, post-translational modifications) onto these structures. RESULTS: To streamline these processes we recently developed Aquaria, a resource offering unprecedented access to protein structure information based on an all-against-all comparison of SwissProt and PDB sequences. In this work, we provide a requirements analysis for several frequently occurring tasks in molecular biology and describe how design choices in Aquaria meet these requirements. Finally, we show how the interface can be used to explore features of a protein and gain biologically meaningful insights in two case studies conducted by domain experts. CONCLUSIONS: The user interface design of Aquaria enables biologists to gain unprecedented access to molecular structures and simplifies the generation of insight. The tasks involved in mapping sequence features onto structures can be conducted more easily and quickly using Aquaria.

Subject(s)

Amyloid beta-Protein Precursor/chemistry , Computational Biology/methods , Computer Graphics , Sequence Analysis, Protein/methods , Software , src-Family Kinases/chemistry , Amyloid beta-Protein Precursor/metabolism , B-Lymphocytes/metabolism , Databases, Protein , Humans , Protein Conformation , Protein Processing, Post-Translational , src-Family Kinases/metabolism

Aquaria: simplifying discovery and insight from protein structures.

O'Donoghue, Seán I; Sabir, Kenneth S; Kalemanov, Maria; Stolte, Christian; Wellmann, Benjamin; Ho, Vivian; Roos, Manfred; Perdigão, Nelson; Buske, Fabian A; Heinrich, Julian; Rost, Burkhard; Schafferhans, Andrea.

Nat Methods ; 12(2): 98-9, 2015 Feb.

Article in English | MEDLINE | ID: mdl-25633501

Subject(s)

Databases, Protein , Proteins/chemistry , Amino Acid Sequence , Molecular Sequence Data , Protein Conformation

How to learn about gene function: text-mining or ontologies?

Soldatos, Theodoros G; Perdigão, Nelson; Brown, Nigel P; Sabir, Kenneth S; O'Donoghue, Seán I.

Methods ; 74: 3-15, 2015 Mar.

Article in English | MEDLINE | ID: mdl-25088781

ABSTRACT

As the amount of genome information increases rapidly, there is a correspondingly greater need for methods that provide accurate and automated annotation of gene function. For example, many high-throughput technologies (e.g., next-generation sequencing) are being used today to generate lists of genes associated with specific conditions. However, their functional interpretation remains a challenge, and many tools exist that try to characterize the function of gene lists. Such systems typically rely on enrichment analysis and aim to give a quick insight into the underlying biology by presenting it in the form of a summary report. While the load of annotation may be alleviated by such computational approaches, the main challenge in modern annotation remains to develop a systems form of analysis in which a pipeline can effectively analyze gene lists quickly and identify aggregated annotations through computerized resources. In this article we survey some of the many such tools and methods that have been developed to automatically interpret the biological functions underlying gene lists. We overview current functional annotation aspects from the perspective of their epistemology (i.e., the underlying theories used to organize information about gene function into a body of verified and documented knowledge) and find that most of the currently used functional annotation methods fall broadly into one of two categories: they are based either on 'known' formally-structured ontology annotations created by 'experts' (e.g., the GO terms used to describe the function of Entrez Gene entries), or, perhaps more adventurously, on annotations inferred from literature (e.g., many text-mining methods use computer-aided reasoning to acquire knowledge represented in natural languages). Overall, however, deriving detailed and accurate insight from such gene lists remains a challenging task, and improved methods are called for. In particular, future methods need to (1) provide more holistic insight into the underlying molecular systems; (2) provide better follow-up experimental testing and treatment options; and (3) better manage gene lists derived from organisms that are not well-studied. We discuss some promising approaches that may help achieve these advances, especially the use of extended dictionaries of biomedical concepts and molecular mechanisms, as well as greater use of annotation benchmarks.

Subject(s)

Data Mining/methods , Databases, Genetic , Gene Ontology , Animals , Data Mining/trends , Databases, Genetic/trends , Gene Ontology/trends , Humans
